Semi-supervised Learning Methods for Data Augmentation
Author: Nikola Mrksic
Abstract
The original goal of this project was to investigate the extent to which data augmentation schemes based on semi-supervised learning algorithms can improve classification accuracy in supervised learning problems. The objectives included determining the appropriate algorithms, customising them for the purposes of this project and providing their Matlab implementations. These algorithms were to be used to develop a robust system for achieving data augmentation in arbitrary application areas. For evaluation purposes, a general framework for assessing the quality of the data augmentation achieved was to be constructed.

The project met and exceeded all of the success criteria. A survey of the theoretical results underlying data augmentation has been conducted. Full, general implementations of the Bayesian Sets, Spy-EM and Roc-SVM algorithms, as well as their proposed extensions, have been produced. A general scheme for achieving data augmentation in binary and multi-class classification has been developed and successfully applied to the three application areas proposed. An evaluation framework for assessing the quality of data augmentation was implemented and used to establish the statistical significance of the results obtained.

Special Difficulties
None.

Declaration
I, Nikola Mrksic of Trinity College, being a candidate for Part II of the Computer Science Tripos, hereby declare that this dissertation and the work described in it are my own work, unaided except as may be specified below, and that the dissertation does not contain material that has already been used to any substantial extent for a comparable purpose. I give permission for my dissertation to be made available in the archive area of the Laboratory's website.
Similar resources
A Constrained Semi-supervised Learning Approach to Data Association
Data association (obtaining correspondences) is a ubiquitous problem in computer vision. It appears when matching image features across multiple images, matching image features to object recognition models and matching image features to semantic concepts. In this paper, we show how a wide class of data association tasks arising in computer vision can be interpreted as a constrained semi-supervi...
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
Effective convolutional neural networks are trained on large sets of labeled data. However, creating large labeled datasets is a very costly and time-consuming task. Semi-supervised learning uses unlabeled data to train a model with higher accuracy when there is a limited set of labeled data available. In this paper, we consider the problem of semi-supervised learning with convolutional neural ...
Painless Semi-Supervised Morphological Segmentation using Conditional Random Fields
We discuss data-driven morphological segmentation, in which word forms are segmented into morphs, that is the surface forms of morphemes. We extend a recent segmentation approach based on conditional random fields from purely supervised to semi-supervised learning by exploiting available unsupervised segmentation techniques. We integrate the unsupervised techniques into the conditional random f...
Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation
Traditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the distribution on labels per node thereby achieving a space redu...
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach, using random forest as the base classifier, to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small amount of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
Publication date: 2013